Skip to content

Conversation

@sol1105
Copy link
Contributor

@sol1105 sol1105 commented Aug 8, 2025

This is a more lightweight approach to what I tried to achieve in #317

When remapping to a larger domain with the method nearest_s2d, data points outside the original domain will - by the nature of this method - be extrapolated, i. e. have values from the edge of the original domain.

This PR introduces a Regridder option I labeled post_mask_source:

        post_mask_source : str or array-like, optional
            Optionally applies a post-processing step to remove selected source grid cells from
            contributing to the regridding weight matrix.

            Note: This differs from the typical masking approach, which prevents source cells from
            being used during weight generation. Here, the regridding weights are modified *after*
            creation to remove the contribution of specified source grid cells.

            Options:

            - If set to ``"domain_edge"``, the outermost edge cells of the source grid are
              automatically detected and their contribution to the regridding weights is removed.
              This is useful to avoid extrapolation beyond the domain boundary when using the
              nearest-neighbor method ``'nearest_s2d'``, particularly when remapping from a smaller
              to a larger domain (as is common with regional source grids like CORDEX).
              Only supported for ``Grid`` type ESMF objects as source grid.

            - If an array-like of integers is provided, it is interpreted as flat indices
              (i.e., 1D indices of the flattened source grid) identifying source cells
              whose contribution to the regridding weights should be removed.

            Default is ``None``, meaning no post-weight-generation source grid cell masking is applied.

This is useful when remapping data on regional grids such as CORDEX data. A demonstration can be seen here at the bottom of Curvilinear_grid.ipynb:
https://nbviewer.org/github/sol1105/xESMF/blob/domain_mask_nearest/doc/notebooks/Curvilinear_grid.ipynb#Undesired-extrapolation

I added it as an option to the Regridder and BaseRegridder and re-arranged some of the masking steps for the parallel option to make it compatible. Beside the CORDEX use case (post_mask_source="domain_edge"), I allowed to specify arbitrary source cells to be masked as well, while I do not have a use case for this on the top of my head. I did not lock this option to nearest_s2d only, but also here I do not have a use case in mind.

From #317 I added the function to create a mask from remapping weights to smm.py. I added an example to the documentation for that as well.

Example:

Source:
image

Before:
image

After (post_mask_source="domain_edge"):
image

sol1105 added 4 commits August 7, 2025 17:23
…t_s2d method

* BaseRegridder / Regridder:
  - Adding new argument 'post_mask_source'
* smm module:
  - new function 'mask_source_indices' altering the weights by
    removing the contribution of selected source grid cells
  - new function 'gen_mask_from_weights' allowing to generate
    a 2D binary mask out of regridding weights
* util module:
  - new function _get_edge_indices_2d, returning array of the indices
    of cells at the edges of a (raveled) 2D grid
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Collaborator

@aulemahal aulemahal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This looks like a great addition!

Could you add a section to the changelog ? I think you can start a new 0.9.0 (unreleased) section.

@sol1105
Copy link
Contributor Author

sol1105 commented Sep 9, 2025

@aulemahal Thanks for the review. I updated the changelog for both PRs, but it seems cf_xarray 0.10.8 introduced an issue in an internal function called by cf_xarray.helpers.bounds_to_vertices, causing now a test to fail (test_compare_weights_from_poly_and_grid). I pinned cf_xarray to <0.10.8 and the tests are successful. I created an issue for cf_xarray (xarray-contrib/cf-xarray#593).

If it is ok with you, could you merge any of the two PRs into master, and then I would resolve the merge conflicts in the other PR, as I have already resolved those locally?

Here is the error traceback with cf_xarray 0.10.8:

    @pytest.mark.filterwarnings('ignore:`polys` contains large')
    def test_compare_weights_from_poly_and_grid():
        """Confirm that the weights are identical when they are computed from a grid->grid and grid->poly."""
    
        # Global grid
        ds = xe.util.grid_global(20, 12, cf=True)
    
        # A single destination tile
        tile = xe.util.cf_grid_2d(-40, -80, -40, 0, 80, 80)
        ds['a'] = xr.DataArray(
            np.ones((ds.lon.size, ds.lat.size)),
            coords={'lat': ds.lat, 'lon': ds.lon},
            dims=('lon', 'lat'),
        )
    
        # Create polygon from tile corners
        x1, x2 = tile.lon_bounds.isel(lon=0)
        y1, y2 = tile.lat_bounds.isel(lat=0)
        poly = Polygon([(x1, y1), (x2, y1), (x2, y2), (x1, y2)])
    
        # Regrid using two identical destination grids (in theory)
>       rgrid = xe.Regridder(ds, tile, method='conservative')
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

xesmf/tests/test_frontend.py:911: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
xesmf/frontend.py:940: in __init__
    grid_out, shape_out, output_dims = ds_to_ESMFgrid(ds_out, need_bounds=need_bounds)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
xesmf/frontend.py:168: in ds_to_ESMFgrid
    lon_b, lat_b = _get_lon_lat_bounds(ds)
                   ^^^^^^^^^^^^^^^^^^^^^^^
xesmf/frontend.py:110: in _get_lon_lat_bounds
    lon_b = cfxr.bounds_to_vertices(lon_bnds, ds.cf.get_bounds_dim_name('longitude'), order=None)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cf_xarray/helpers.py:182: in bounds_to_vertices
    core_dim_orders = _get_core_dim_orders(core_dim_coords)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

core_dim_coords = {'lon': array([-60.])}

    def _get_core_dim_orders(core_dim_coords: dict[str, np.ndarray]) -> dict[str, str]:
        """
        Determine the order (ascending, descending, or mixed) of each core dimension
        based on its coordinates.
    
        Repeated (equal) coordinates are ignored when determining the order. If all
        coordinates are equal, the order is treated as "ascending".
    
        Parameters
        ----------
        core_dim_coords : dict of str to np.ndarray
            A dictionary mapping dimension names to their coordinate arrays.
    
        Returns
        -------
        core_dim_orders : dict of str to str
            A dictionary mapping each dimension name to a string indicating the order:
            - "ascending": strictly increasing (ignoring repeated values)
            - "descending": strictly decreasing (ignoring repeated values)
            - "mixed": neither strictly increasing nor decreasing (ignoring repeated values)
        """
        core_dim_orders = {}
    
        for dim, coords in core_dim_coords.items():
            diffs = np.diff(coords)
    
            # Handle datetime64 and timedelta64 safely for both numpy 1.26.4 and numpy 2
            if np.issubdtype(coords.dtype, np.datetime64) or np.issubdtype(
                coords.dtype, np.timedelta64
            ):
                # Cast to float64 for safe comparison
                diffs_float = diffs.astype("float64")
                nonzero_diffs = diffs_float[diffs_float != 0]
>           elif isinstance(diffs[0], datetime.timedelta):
                            ^^^^^^^^
E           IndexError: index 0 is out of bounds for axis 0 with size 0

cf_xarray/helpers.py:234: IndexError

@aulemahal
Copy link
Collaborator

Indeed, when reviewing your PR I noticed the cf-xarray error. I opened #453 to skip the failing test. I think the buggy case is narrow enough, we didn't need to pin cf-xarray in the dependencies. Doing it like you did in the CI is another similar solution. I'll try to merge my PR first and then both of yours, removing the pin.

@sol1105 sol1105 requested a review from aulemahal October 15, 2025 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants